On the robustness of gender differences in economic behavior

Because economic decisions are so important, researchers have examined which factors influence them. Gender has received a lot of attention as an explanation for differences in behavior. But how much can be associated with gender, and how much with an individual’s biological sex? We run an experimental online study with cis- and transgender participants in which we (1) examine correlational differences between gender and sex in competitiveness, risk-taking, and altruism by comparing decisions across these subject groups, and (2) prime participants with either a masculine or feminine gender identity to examine causal gender effects on behavior. We hypothesize that if gender is indeed a primary factor for decision-making, (i) individuals of the same gender (but different sex) make similar decisions, and (ii) gender priming changes behavior. Based on 780 observations, we conclude that the role of gender (and sex) is not as decisive for economic behavior as originally thought.

2 Priming (Part 1)

Figure 1 presents the number of marked words split up by treatments and subject groups. We do not find any differences in marked words within one priming condition across subject groups (KW; NEUTRAL: p = 0.349; FEMININE: p = 0.874; MASCULINE: p = 0.112). Looking at each subject group separately across priming conditions, only the number of words marked by transmen did not differ (KW; cismen: p < 0.001; ciswomen: p = 0.038; transmen: p = 0.123; transwomen: p = 0.014). Concerning gender differences, we do not see significant variations (MWU; p = 0.675). The same is true for sex differences (MWU; p < 0.060). As we did not pre-register to control for the number of words marked in our regressions, we do not add this variable in the reported analysis. However, please note that all main results remain qualitatively the same when we account for the heterogeneity in the number of marked words. The additional analyses are available on request.

Supplementary Figure 1 Marked words in Part 1 by treatments and subject groups in alphabetical order (n = 780). The bars show the average number of marked words, and the error bars represent the standard error of the mean.
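The comparisons above rely on two rank-based tests. As a minimal sketch, assuming that KW denotes the Kruskal-Wallis test and MWU the Mann-Whitney U test, the following standard-library Python code shows how such p-values can be computed. The word counts are invented for illustration and are not the study's data. The Kruskal-Wallis helper is restricted to exactly three groups (as with the three priming conditions), because the chi-square survival function then has the closed form exp(-H/2). The Mann-Whitney helper uses the normal approximation without continuity correction, which may differ slightly from the exact test a statistics package would report.

```python
import math
from collections import Counter
from itertools import chain


def average_ranks(values):
    """Return 1-based ranks; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def tie_term(values):
    """Sum of t^3 - t over all groups of tied values (0 if there are no ties)."""
    return sum(t ** 3 - t for t in Counter(values).values())


def kruskal_wallis_3(a, b, c):
    """H statistic and p-value for exactly three groups (df = 2).

    Undefined (division by zero) if all pooled values are identical.
    """
    groups = [a, b, c]
    pooled = list(chain.from_iterable(groups))
    n = len(pooled)
    ranks = average_ranks(pooled)
    total, start = 0.0, 0
    for g in groups:
        rank_sum = sum(ranks[start:start + len(g)])
        total += rank_sum ** 2 / len(g)
        start += len(g)
    h = 12 / (n * (n + 1)) * total - 3 * (n + 1)
    h /= 1 - tie_term(pooled) / (n ** 3 - n)  # tie correction
    return h, math.exp(-h / 2)  # chi-square survival function for df = 2


def mann_whitney_u(x, y):
    """U statistic of x and two-sided p-value (normal approximation)."""
    pooled = list(x) + list(y)
    n1, n2 = len(x), len(y)
    n = n1 + n2
    ranks = average_ranks(pooled)
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    var_u = n1 * n2 / 12 * ((n + 1) - tie_term(pooled) / (n * (n - 1)))
    z = (u1 - n1 * n2 / 2) / math.sqrt(var_u)
    return u1, math.erfc(abs(z) / math.sqrt(2))


# Hypothetical marked-word counts per priming condition (not real data).
neutral = [5, 7, 6, 8, 4]
feminine = [9, 8, 10, 7, 11]
masculine = [6, 9, 7, 10, 8]
print("KW :", kruskal_wallis_3(neutral, feminine, masculine))
print("MWU:", mann_whitney_u(neutral, feminine))
```

With four subject groups the Kruskal-Wallis statistic is computed in the same way, but the p-value then requires the chi-square distribution with three degrees of freedom, which this sketch does not implement.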

Supplementary Table 6 Words found in the priming task across treatments and subject groups.

3 Performance in the real effort math task (Part 2, 3, and 4)

The following tables summarize the performance in the math task by treatment and subject groups for Part 2 (Table 7) and Part 3 (Table 8). Comparing ciswomen and cismen by treatment, performance differs significantly only in MASCULINE in Part 2 (MWU; NEUTRAL: p = 0.080, FEMININE: p = 0.205, MASCULINE: p = 0.037), but in all treatments in Part 3 (MWU; NEUTRAL: p = 0.004, FEMININE: p = 0.010, MASCULINE: p = 0.028). The paper that set up the online version of this math task 1 also finds gender differences in performance. Thomas Buser states in a personal communication that women "perform significantly worse". In their paper, this is true for their first round (our Part 2), where "women score 1.3 fewer correct answers" than men (male average: 10.0). Moreover, in their second round (our Part 4), men score 10.0 on average, but women score 0.7 fewer correct answers.
Please note that we cannot exclude that performance in the math task is influenced by a participant's gender and sex, combinations of the two, and interactions with priming. However, we can control for how performance heterogeneity affects competitiveness by adding individual performance to our regressions measuring competitiveness. See Table 15 to Table 16.

Note: The beliefs in Part 3 are the participants' beliefs about how their performance ranks within the group (1 = best to 4 = worst). Standard errors in parentheses are heteroskedasticity robust. In the second last column from the right, the baseline is a non-student, non-religious cisman, who earns more than 20K GBP, and lives in continental Europe. In the last column from the right, the baseline is a cisman. *** p < 0.001; ** p < 0.01; * p < 0.05. Rows starting with H0 report the p-values of a joint coefficient test that the coefficients' sum equals 0. For example, "H0: FEMININE on Ciswomen" tests the effect of the treatment (FEMININE) on the subject group (Ciswomen).
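The joint coefficient test described in the note can be illustrated with a minimal sketch. It assumes a fully saturated regression that contains only treatment dummies, subject-group dummies, and their interactions, with no further controls. This is a simplification and not the specification reported in the tables. Under that assumption, the sum of the FEMININE coefficient and the FEMININE x Ciswomen interaction equals the difference in mean outcomes between ciswomen in FEMININE and ciswomen in the baseline treatment, and the heteroskedasticity-robust (HC0) variance of each cell mean is the sum of squared residuals in that cell divided by the squared cell size. The function names and data below are hypothetical, and the p-value uses a normal approximation.

```python
import math


def cell_mean_and_hc0_var(outcomes):
    """Mean of one treatment-by-group cell and the HC0 variance of that mean."""
    n = len(outcomes)
    m = sum(outcomes) / n
    # In a saturated dummy regression the residual is the deviation
    # from the cell mean, so the sandwich variance is sum(e^2) / n^2.
    var = sum((v - m) ** 2 for v in outcomes) / n ** 2
    return m, var


def joint_sum_test(treated_cell, baseline_cell):
    """Test H0: beta_treatment + beta_interaction = 0 for one subject group.

    Returns (estimate, robust standard error, two-sided p-value).
    Raises ZeroDivisionError if both cells have zero variance.
    """
    m1, v1 = cell_mean_and_hc0_var(treated_cell)
    m0, v0 = cell_mean_and_hc0_var(baseline_cell)
    estimate = m1 - m0          # equals the sum of the two coefficients
    se = math.sqrt(v1 + v0)     # cells are independent, so variances add
    z = estimate / se
    return estimate, se, math.erfc(abs(z) / math.sqrt(2))


# Hypothetical tournament-entry decisions (1 = enters) for one subject group.
feminine_prime = [1, 0, 1, 1, 0, 1, 0, 1]
neutral_prime = [0, 0, 1, 0, 1, 0, 0, 1]
print(joint_sum_test(feminine_prime, neutral_prime))
```

Once control variables enter the regression, the sum of coefficients no longer reduces to a simple difference in cell means, and the test has to be run on the full estimated covariance matrix instead.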

Non-parametric tests
Supplementary Note: Competition is a binary variable equal to 1 if the participant enters the tournament in Part 4 and 0 otherwise. Delta perf. is the difference in performance between Part 3 (tournament) and Part 2 (piece-rate). Belief tournament is the participants' belief of their performance rank within their group in Part 3, where the value 1 represents the rank with the highest performance. Standard errors in parentheses are heteroskedasticity robust. In the second last column from the right, the baseline is a non-student, non-religious cisman, who earns more than 20K GBP, and lives in continental Europe. In the last column from the right, the baseline is a cisman. *** p < 0.001; ** p < 0.01; * p < 0.05.
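The variable definitions in this note translate directly into code. The sketch below is illustrative only: the field names and the string coding of the Part 4 choice are hypothetical and do not come from the study's data files.

```python
from dataclasses import dataclass


@dataclass
class Participant:
    part2_correct: int  # correct answers under the piece-rate scheme (Part 2)
    part3_correct: int  # correct answers under the tournament scheme (Part 3)
    part4_choice: str   # hypothetical coding: "tournament" or "piece_rate"
    belief_rank: int    # believed rank within the group in Part 3 (1 = best)


def competition(p: Participant) -> int:
    """Competition: 1 if the participant enters the tournament in Part 4."""
    return 1 if p.part4_choice == "tournament" else 0


def delta_perf(p: Participant) -> int:
    """Delta perf.: Part 3 (tournament) minus Part 2 (piece-rate) performance."""
    return p.part3_correct - p.part2_correct


example = Participant(part2_correct=8, part3_correct=11,
                      part4_choice="tournament", belief_rank=2)
print(competition(example), delta_perf(example), example.belief_rank)
```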

Supplementary Note: Competition is a binary variable equal to 1 if the participant enters the tournament in Part 4 and 0 otherwise. Delta perf. is the difference in performance between Part 3 (tournament) and Part 2 (piece-rate). Belief tournament is the participants' belief of their performance rank within their group in Part 3, where the value 1 represents the rank with the highest performance. Standard errors in parentheses are heteroskedasticity robust. In the second last column from the right, the baseline is a non-student, non-religious cisman, who earns more than 20K GBP, and lives in continental Europe. In the last column from the right, the baseline is a cisman. *** p < 0.001; ** p < 0.01; * p < 0.05. Rows starting with H0 report the p-values of a joint coefficient test that the coefficients' sum equals 0. For example, "H0: FEMININE on Ciswomen" tests the effect of the treatment (FEMININE) on the subject group (Ciswomen).

Cohen's d
Supplementary Note: Standard errors in parentheses are heteroskedasticity robust. In the second last column from the right, the baseline is a non-student, non-religious cisman, who earns more than 20K GBP, and lives in continental Europe. In the last column from the right, the baseline is a cisman. *** p < 0.001; ** p < 0.01; * p < 0.05. Rows starting with H0 report the p-values of a joint coefficient test that the coefficients' sum equals 0. For example, "H0: FEMININE on Ciswomen" tests the effect of the treatment (FEMININE) on the subject group (Ciswomen).

Note: Standard errors in parentheses are heteroskedasticity robust. In the second last column from the right, the baseline is a non-student, non-religious cisman, who earns more than 20K GBP, and lives in continental Europe. In the last column from the right, the baseline is a cisman. *** p < 0.001; ** p < 0.01; * p < 0.05. Rows starting with H0 report the p-values of a joint coefficient test that the coefficients' sum equals 0. For example, "H0: FEMININE on Ciswomen" tests the effect of the treatment (FEMININE) on the subject group (Ciswomen).

Note: Competition is a binary variable equal to 1 if the participant enters the tournament in Part 4 and 0 otherwise. Delta perf. is the difference in performance between Part 3 (tournament) and Part 2 (piece-rate). Belief tournament is the participants' belief of their performance rank within their group in Part 3, where the value 1 represents the rank with the highest performance. Standard errors in parentheses are heteroskedasticity robust. In the second last column from the right, the baseline is a non-student, non-religious person, who earns more than 20K GBP, and lives in continental Europe. *** p < 0.001; ** p < 0.01; * p < 0.05.

Supplementary Note: Competition is a binary variable equal to 1 if the participant enters the tournament in Part 4 and 0 otherwise. Delta perf. is the difference in performance between Part 3 (tournament) and Part 2 (piece-rate). Belief tournament is the participants' belief of their performance rank within their group in Part 3, where the value 1 represents the rank with the highest performance. Standard errors in parentheses are heteroskedasticity robust. In the second last column from the right, the baseline is a non-student, non-religious person, who earns more than 20K GBP, and lives in continental Europe. *** p < 0.001; ** p < 0.01; * p < 0.05. Rows starting with H0 report the p-values of a joint coefficient test that the coefficients' sum equals 0. For example, "H0: FEMININE on BEM score: Feminine" tests the effect of the treatment (FEMININE) on the subject group (BEM score: Feminine).

Risk
Supplementary Note: Standard errors in parentheses are heteroskedasticity robust. In the second last column from the right, the baseline is a non-student, non-religious person, who earns more than 20K GBP, and lives in continental Europe. *** p < 0.001; ** p < 0.01; * p < 0.05.

Supplementary Note: Standard errors in parentheses are heteroskedasticity robust. In the second last column from the right, the baseline is a non-student, non-religious person, who earns more than 20K GBP, and lives in continental Europe. *** p < 0.001; ** p < 0.01; * p < 0.05.

Supplementary Note: Competition is a binary variable equal to 1 if the participant enters the tournament in Part 4 and 0 otherwise. Delta perf. is the difference in performance between Part 3 (tournament) and Part 2 (piece-rate). In the second last column from the right, the baseline is a non-student, non-religious cisman, who earns more than 20K GBP, and lives in continental Europe. In the last column from the right, the baseline is a cisman. *** p < 0.001; ** p < 0.01; * p < 0.05. Rows starting with H0 report the p-values of a joint coefficient test that the coefficients' sum equals 0. For example, "H0: Rem. feminine words on Ciswomen" tests the effect of Rem. feminine words on the subject group (Ciswomen).

Risk
Supplementary Note: Standard errors in parentheses are heteroskedasticity robust. In the second last column from the right, the baseline is a non-student, non-religious cisman, who earns more than 20K GBP, and lives in continental Europe. In the last column from the right, the baseline is a cisman. *** p < 0.001; ** p < 0.01; * p < 0.05. Rows starting with H0 report the p-values of a joint coefficient test that the coefficients' sum equals 0. For example, "H0: Rem. feminine words on Ciswomen" tests the effect of Rem. feminine words on the subject group (Ciswomen).

Altruism
Supplementary Note: Standard errors in parentheses are heteroskedasticity robust. In the second last column from the right, the baseline is a non-student, non-religious cisman, who earns more than 20K GBP, and lives in continental Europe. In the last column from the right, the baseline is a cisman. *** p < 0.001; ** p < 0.01; * p < 0.05. Rows starting with H0 report the p-values of a joint coefficient test that the coefficients' sum equals 0. For example, "H0: Rem. feminine words on Ciswomen" tests the effect of Rem. feminine words on the subject group (Ciswomen).

Competitiveness
Supplementary Note: Competition is a binary variable equal to 1 if the participant enters the tournament in Part 4 and 0 otherwise. Delta perf. is the difference in performance between Part 3 (tournament) and Part 2 (piece-rate). Standard errors in parentheses are heteroskedasticity robust. In the second last column from the right, the baseline is a non-student, non-religious cisman, who earns more than 20K GBP, and lives in continental Europe. In the last column from the right, the baseline is a cisman. *** p < 0.001; ** p < 0.01; * p < 0.05. Rows starting with H0 report the p-values of a joint coefficient test that the coefficients' sum equals 0. For example, "H0: FEMININE on Ciswomen" tests the effect of the treatment (FEMININE) on the subject group (Ciswomen).

Risk
Supplementary Note: Standard errors in parentheses are heteroskedasticity robust. In the second last column from the right, the baseline is a non-student, non-religious cisman, who earns more than 20K GBP, and lives in continental Europe. In the last column from the right, the baseline is a cisman. *** p < 0.001; ** p < 0.01; * p < 0.05. Rows starting with H0 report the p-values of a joint coefficient test that the coefficients' sum equals 0. For example, "H0: FEMININE on Ciswomen" tests the effect of the treatment (FEMININE) on the subject group (Ciswomen).

Altruism
Supplementary Note: Standard errors in parentheses are heteroskedasticity robust. In the second last column from the right, the baseline is a non-student, non-religious cisman, who earns more than 20K GBP, and lives in continental Europe. In the last column from the right, the baseline is a cisman. *** p < 0.001; ** p < 0.01; * p < 0.05. Rows starting with H0 report the p-values of a joint coefficient test that the coefficients' sum equals 0. For example, "H0: FEMININE on Ciswomen" tests the effect of the treatment (FEMININE) on the subject group (Ciswomen).

Competitiveness
Supplementary Note: Competition is a binary variable equal to 1 if the participant enters the tournament in Part 4 and 0 otherwise. Delta perf. is the difference in performance between Part 3 (tournament) and Part 2 (piece-rate). Belief tournament is the participants' belief of their performance rank within their group in Part 3, where the value 1 represents the rank with the highest performance. Standard errors in parentheses are heteroskedasticity robust. In the second last column from the right, the baseline is a non-student, non-religious cisman, who earns more than 20K GBP, and lives in continental Europe. In the last column from the right, the baseline is a cisman. *** p < 0.001; ** p < 0.01; * p < 0.05. Rows starting with H0 report the p-values of a joint coefficient test that the coefficients' sum equals 0. For example, "H0: FEMININE on Ciswomen" tests the effect of the treatment (FEMININE) on the subject group (Ciswomen).

Risk
Supplementary Note: Standard errors in parentheses are heteroskedasticity robust. In the second last column from the right, the baseline is a non-student, non-religious cisman, who earns more than 20K GBP, and lives in continental Europe. In the last column from the right, the baseline is a cisman. *** p < 0.001; ** p < 0.01; * p < 0.05. Rows starting with H0 report the p-values of a joint coefficient test that the coefficients' sum equals 0. For example, "H0: FEMININE on Ciswomen" tests the effect of the treatment (FEMININE) on the subject group (Ciswomen).

Altruism
Supplementary Note: Standard errors in parentheses are heteroskedasticity robust. In the second last column from the right, the baseline is a non-student, non-religious cisman, who earns more than 20K GBP, and lives in continental Europe. In the last column from the right, the baseline is a cisman. *** p < 0.001; ** p < 0.01; * p < 0.05. Rows starting with H0 report the p-values of a joint coefficient test that the coefficients' sum equals 0. For example, "H0: FEMININE on Ciswomen" tests the effect of the treatment (FEMININE) on the subject group (Ciswomen).

Note: Competition is a binary variable equal to 1 if the participant enters the tournament in Part 4 and 0 otherwise. Delta perf. is the difference in performance between Part 3 (tournament) and Part 2 (piece-rate). Belief tournament is the participants' belief of their performance rank within their group in Part 3, where the value 1 represents the rank with the highest performance. Gender congruent upbringing is a binary variable equal to 1 if the way the participant's parents treated the participant matches the reported gender of the participant or the parents treated their child neutrally. Standard errors in parentheses are heteroskedasticity robust. In the second last column from the right, the baseline is a non-student, non-religious person, who earns more than 20K GBP, and lives in continental Europe. *** p < 0.001; ** p < 0.01; * p < 0.05.

Supplementary Note: Gender congruent upbringing is a binary variable equal to 1 if the way the participant's parents treated the participant matches the reported gender of the participant or the parents treated their child neutrally. Standard errors in parentheses are heteroskedasticity robust. In the second last column from the right, the baseline is a non-student, non-religious person, who earns more than 20K GBP, and lives in continental Europe. *** p < 0.001; ** p < 0.01; * p < 0.05.

Supplementary Note: Gender congruent upbringing is a binary variable equal to 1 if the way the participant's parents treated the participant matches the reported gender of the participant or the parents treated their child neutrally. Standard errors in parentheses are heteroskedasticity robust. In the second last column from the right, the baseline is a non-student, non-religious person, who earns more than 20K GBP, and lives in continental Europe. *** p < 0.001; ** p < 0.01; * p < 0.05.

14 Detailed literature summary

14.1 Competitiveness

Differences in competitiveness have become an essential explanation for labor market outcomes like variations in wages 2 , and different demands in wage negotiations 3 . Pinning down the causes and consequences of the willingness to compete is important as it correlates with several relevant choices and characteristics for education and labor market outcomes 4 . For example, subjects who are more competitive have been found to be more likely to choose competitive educational programs 1,5-7 , to have a higher income 8-10 and to become entrepreneurs 11 . But what role does one of the main human characteristics, being a man or a woman, play for competitiveness?
Over the last decades, an impressive amount of scientific evidence has shown that women are generally less competitive than men 12-18 . This gender gap in competitiveness (henceforth GGC) is robust across different scientific methods. Studies report that men are more likely to compete in classical lab 18 , lab-in-the-field 19 , field 20 , and online experiments 1 . The findings also replicate with subjects from different age groups like children 16 , students 18 , and non-students 21 .
Recently, some evidence has been collected on the lack of a GGC in certain circumstances. For example, for the matriarchy of Masai in Kenya, adult women are reported to be even more competitive than men 19 . Similarly, children living in the Khasi matrilineal society in northeast India are equally competitive 21 . Without looking that far afield, it has been shown that the type of school children attend influences competitiveness, with female students from girls' schools being as competitive as boys 22 . Moreover, for children from families with lower socioeconomic backgrounds, no GGC is reported 15 . Cultural differences also play a role in competitiveness, as shown by 23 . They found that children are equally competitive in Colombia, but boys in Sweden are more competitive than girls. These studies suggest that women's lower willingness to compete is not something that they are born with, but rather a behavioral preference that can be influenced by different factors and can thus be attributed to nurture rather than nature. Support for this perspective is provided by research showing that the GGC can be closed or reversed by interventions that do not influence participants' biological makeup. For example, some studies change the institutional environment to resemble different affirmative action policies and obtain gender balance in competitive environments 18,24-26 . Others use the easy-to-implement intervention of priming 27,28 , which encourages women to enter competitions more often. Moreover, giving feedback about relative performance 29 and about the earnings implications of competition avoidance 30 successfully increases women's entry rates, as does advice from more experienced people who encourage strong-performing women to compete 31 . In addition, when the prize of the competition benefits not the participants themselves but their offspring, again no GGC has been observed 32 .
However, it is also plausible that biological factors like genes and hormones lead women and men to make different decisions and are also a primary driver of behavior. Thus, a new and still developing field of research focuses on competitiveness from a more elementary perspective by taking hormones into account. Up to now, there is only one study, by 33 , which causally analyzes the effect of estrogen and progestin (by administering oral contraceptives) on competitiveness. The authors find no impact of the two hormones on the willingness to compete. All other studies use self-reported hormonal measures, asking female participants about their menstrual cycle day and their use of hormonal contraceptives to infer their hormonal level. Using self-reports is noisy (for a detailed discussion of why this is the case, see 34 ) and leads to mixed findings on whether hormones play a role in competitiveness or not 8,35 .
The existing evidence already provides results on which factors correlate with competitive behavior and how differences in competitiveness between men and women can be closed. However, this paper will be the first to test the robustness of the GGC when priming subjects with a specific gender identity. Moreover, we contribute to the literature by investigating the willingness to compete of transgender subjects. To the best of our knowledge, no economic experiments have been done with transgender participants. According to our review of the literature, considering the behavior of LGBTQ+ individuals is extremely rare in experimental economics. We found only one paper on homosexuality and competitiveness 8 . These aspects underline our study's potential to expand knowledge in the domain of competitive behavior.

Risk
Risk-taking is considered a fundamental determinant of individual behavior in different domains like health 36,37 , stock market participation 38 , saving decisions 39 , occupational and self-employment choices 40 , personal and household finance 41,42 , education 43 and environmental decision making 44 . One strand of the literature in Behavioral Economics reports seemingly strong evidence for women preferring to take less risk compared to men 45 . This difference in risk-taking is robust when using different experimental methods to measure risk, such as lotteries 46 , investment games 47 or card games 48 . It is also reported for subjects varying from children 23 , to students 49 , to non-students 50 .

Moreover, the difference does not depend on whether the experiment is conducted in the lab or in other environments such as online platforms 50 .
Another strand of the literature does not support that risk-taking differs by gender. Those papers mainly rely on different underlying methodologies than those used by the studies mentioned above. First, they claim that it is important to clearly distinguish between differences on the individual level (categorical differences between men and women) and patterns that appear only at the aggregate level (such as statistically detectable differences in means) 51 . Second, using quantitative measures of substantive differences that are not yet common in economic studies (such as Cohen's d) or measures of substantive overlap (such as the Index of Similarity) also reveals no substantially large gender gap in risk-taking 51,52 . For example, 52 claims that standardized differences in means across gender mostly amount to less than one standard deviation, and that the degree of overlap in the distributions of risk-taking behavior of men and women generally exceeds 80%. On-average differences between (cis-) men and women in behavior are smaller than sex differences in, e.g., height or throwing ability 53 and pale next to the effects of aspects such as cultural manipulations or gender priming (e.g., 52 ). These papers align with the so-called gender similarities hypothesis from the psychological literature, which argues that males and females are similar on most, but not all, psychological dimensions 53 . 54 claims that one explanation for gender differences in risk-taking still being such a prominently repeated finding is that science is biased towards these results because of, e.g., existing stereotypes or confirmation bias for existing publications.
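The effect-size and overlap measures mentioned in this paragraph can be sketched in a few lines of standard-library Python. The sketch makes several assumptions that are not taken from the cited papers: Cohen's d is computed with the pooled standard deviation; the overlap for a given d is the overlapping coefficient of two normal distributions with equal variance, 2Φ(−|d|/2); and the Index of Similarity is taken to be one minus half the sum of absolute differences between the two groups' category shares. The overlapping coefficient and the Index of Similarity are distinct measures, and the data are invented for illustration.

```python
import math
from statistics import mean, variance


def cohens_d(x, y):
    """Standardized mean difference using the pooled standard deviation."""
    nx, ny = len(x), len(y)
    pooled_var = ((nx - 1) * variance(x) + (ny - 1) * variance(y)) / (nx + ny - 2)
    return (mean(x) - mean(y)) / math.sqrt(pooled_var)


def normal_overlap(d):
    """Overlapping coefficient of two equal-variance normals d SDs apart.

    Equals 2 * Phi(-|d| / 2), written here with the complementary error function.
    """
    return math.erfc(abs(d) / (2 * math.sqrt(2)))


def index_of_similarity(counts_a, counts_b):
    """1 - 0.5 * sum |share_a - share_b| over outcome categories (assumed form)."""
    total_a, total_b = sum(counts_a), sum(counts_b)
    gap = sum(abs(a / total_a - b / total_b) for a, b in zip(counts_a, counts_b))
    return 1 - 0.5 * gap


# Hypothetical risk-taking scores (e.g., amount invested) for two groups.
group_a = [2, 4, 6]
group_b = [1, 3, 5]
d = cohens_d(group_a, group_b)
print("Cohen's d:", d)
print("Normal overlap:", normal_overlap(d))

# Hypothetical counts of participants choosing each of three lottery options.
print("Index of Similarity:", index_of_similarity([10, 25, 15], [14, 24, 12]))
```

Under the normality assumption, a standardized difference of half a standard deviation still corresponds to an overlap of roughly 80%, which illustrates how a statistically detectable mean difference can coexist with largely similar distributions.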
Several studies analyze gender differences in risk preferences for sub-populations of managers 49,55,56 and find that women are as risk-averse as men, or even less so. The reasons could be selection, or social learning and adaptive behavior in response to job demands. To disentangle these different factors, 57 uses an online experiment with scientists. They vary the salience of either the private or the professional identity of the subjects. They report that priming the professional identity reduces the gender gap in risk-taking. In addition, the gender gap decreases with age, as female senior scientists choose riskier options in the treatment where the profession is made salient.
Attempts have also been made to explore the connection between biological factors and risk-taking. First, studies explore the causal effect of hormones on behavior. 1 For example, 58 test whether administered testosterone or estrogen affects women's risk-taking. No effect of either testosterone or estrogen on risk-taking could be detected. In line with this, 59 and 60 find no effect of testosterone on risk aversion. 33 take a comparative approach and administer either an oral contraceptive or a placebo-free control. Again, no connection between hormones and behavior is reported. Second, studies test for a correlation between the variation in risk-taking and genes. On the one hand, for example, 61 find no relationship between the dopamine and the serotonin gene and risk-taking. On the other hand, studies using, for example, the twin methodology and genome-wide association techniques (GWAS) report genetic foundations for the willingness to take risks 62-64 . Third, a recent study by 65 showed that the intake of a small dose of acetaminophen, a very popular painkiller, increases risk-taking.
Several researchers prime subjects and study the effect on risk-taking 66-70 . The study closest to our research is 71 , which finds that making the subject's gender salient with a short questionnaire does not impact risk preferences. Also, 72 report an effect of gender priming through questions and stereotypical pictures only on male risk preferences. 73 prime financial professionals by making their professional identity salient, which leads to a decrease in risk-taking in a high-stakes investment game. With a similar subject pool, 74 find that individuals primed with a bust scenario are more risk-averse than those primed with a boom scenario. 75 test the robustness of the results of 74 with an Amazon Mechanical Turk subject pool. They report no evidence of priming influencing risk-taking. 76 primed individuals who were exposed to violence by asking them to recall either happy, fearful, or neutral moments. They find that remembering frightening experiences leads to a higher preference for certainty.
The only related study we are aware of that investigates the risk-taking behavior of LGBTQ+ individuals is 8 . It analyzes risk preferences by asking the subjects about their risk perception (survey question). It finds no significant differences between homosexual and heterosexual men and homosexual and heterosexual women.

Altruism
To what extent someone is pro-social, i.e., altruistic, is argued to explain behavior in the labor market, how individuals vote, whether they take up volunteer work, and how willing someone is to give to a charity 77 . Altruistic behavior is typically measured with a dictator game, where participants are asked how much they want to transfer to an anonymous other participant 78,79 , or how much they wish to donate to a charity 80 . It is a robust finding that participants in experiments transfer quite a substantial part of their endowment in dictator games and thus act altruistically 81 . The literature reports mixed findings on the external validity of those experiments. One strand of the literature finds that individuals behave in donation experiments similarly to how they behave in naturally occurring decisions on charitable giving 82,83 . Other research contradicts these findings, as recently summarized by 84 .
Concerning the level of altruism exhibited by men and women, a wide range of studies shows that women are generally more generous in dictator games. See, e.g., 77 for an up-to-date meta-analysis of the existing literature on gender differences in charitable giving. These authors report that the magnitude of the gender differences in altruism is sensitive to the experimental context. For example, the difference is more prominent if the dictator decides to donate to a charity than if the dictator gives to an anonymous recipient. However, the difference is smaller if the dictator chooses between giving all or nothing than if the dictator decides on a continuous scale.
Turning to studies attempting to link hormones to altruism causally, 60 and 85 found no impact of administered testosterone on dictators' giving. 58 used another approach and administered testosterone, estrogen, or a placebo to the experimental participants. Again, no connection between either hormone and altruism is reported. Moreover, administering an oral contraceptive containing synthetic progesterone as the main ingredient suggests no hormonal impact on altruism levels. However, there is evidence that the underlying genes influence altruism. See, for example, 86 , who used twins for their study.
Multiple studies explore whether different primes influence altruistic behavior. For example, subsequent donations are affected by religious primes 87-90 , by reminding subjects of secular, moral institutions 88 , and by priming with subtle cues of observability 91-93 . 94 report an increased gender gap in altruism when making gender more salient by requiring participants to specify their gender before the dictator game and informing them about the gender of the recipient. Again, we have found no published studies on the altruism of LGBTQ+ individuals in economics.